This analysis examines the financial information behind some of the biggest transit agencies in the US. The data set includes massive systems like the NYC Metro as well as less heavily used systems in states like Hawaii and Oklahoma.
Task One & Two: Data Cleaning
After loading the data, certain column names and row values need to be changed for syntactic reasons and ease of understanding. First, renaming the column UZA Name to metro_area removes the space from the name, so the column no longer has to be wrapped in backticks when referenced, and gives it a more intuitive label.
# A tibble: 1,000 × 8
`NTD ID` Agency metro_area Mode `3 Mode` month UPT VRM
<int> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 30088 County Commissioners … Waldorf, … MB Bus 2003… 3.02e4 6.64e4
2 50037 County of Muskegon Muskegon-… MB Bus 2009… 4.94e4 3.34e4
3 80004 City of Billings Billings,… DR Bus 2022… 3.61e3 1.53e4
4 60107 Texoma Area Paratrans… Sherman--… DR Bus 2017… 2.53e3 2.95e4
5 30034 Maryland Transit Admi… Baltimore… CR Rail 2018… 8.34e5 5.57e5
6 20122 Academy Lines, Inc. New York-… CB Bus 2014… 3.39e5 7.19e5
7 40105 Puerto Rico Highway a… San Juan,… PB Bus 2012… 2.77e6 2.26e6
8 30078 Southwestern Pennsylv… Pittsburg… VP Bus 2013… 2.22e4 8.09e4
9 40053 Greenville Transit Au… Greenvill… DR Bus 2024… 1.24e3 1.05e4
10 50032 Mass Transportation A… Flint, MI DR Bus 2019… 4.80e4 6.75e5
# ℹ 990 more rows
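The rename described above can be sketched roughly as follows. This is a minimal illustration, not the original code: it assumes dplyr is used (the tibble output suggests the tidyverse), the data frame name `transit` is hypothetical, and the two rows are toy values rather than the real data set.

```r
library(dplyr)

# Toy stand-in for the loaded data. `transit` is a hypothetical name and
# the second row is made up purely for illustration.
transit <- tibble(
  `NTD ID`   = c(50032L, 99999L),
  `UZA Name` = c("Flint, MI", "Example Area, XX"),
  Mode       = c("DR", "MB")
)

# rename() takes new_name = old_name. The old name needs backticks
# because it contains a space; the new name does not.
transit <- rename(transit, metro_area = `UZA Name`)

names(transit)
```

After the rename, the column can be referred to as `transit$metro_area` or as a bare `metro_area` inside dplyr verbs, without any quoting.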
The second step is “re-coding” the Mode data in the data frame. The data originally uses codes like “HR” for Heavy Rail or “FB” for Ferry Boat, which are hard to interpret for anyone unfamiliar with them. So we will find all the distinct values, look up their meanings, and then replace each code with a descriptive name. The results will be displayed in a data table using the DT library.
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
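The recoding step described above might look something like the sketch below. It assumes dplyr, uses a hypothetical data frame name `transit` with toy rows, and maps only the two codes whose meanings are given in the text (“HR” and “FB”); every other code is left unchanged until its meaning has been looked up.

```r
library(dplyr)

# Toy stand-in for the Mode column; `transit` is a hypothetical name.
transit <- tibble(Mode = c("HR", "FB", "MB"))

# Step 1: list the distinct codes that need a readable label.
distinct(transit, Mode)

# Step 2: replace known codes with descriptive names.
# The final TRUE branch keeps any code that has not been mapped yet.
transit <- mutate(
  transit,
  Mode = case_when(
    Mode == "HR" ~ "Heavy Rail",
    Mode == "FB" ~ "Ferry Boat",
    TRUE         ~ Mode
  )
)

# Step 3 (optional): build an interactive table if DT is installed.
if (requireNamespace("DT", quietly = TRUE)) {
  mode_table <- DT::datatable(transit)
}
```

Keeping the fallback branch means an unmapped code stays visible in the output instead of silently becoming `NA`, which makes it easy to spot codes that still need a label.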